
Collaborating Authors

age and gender


General Demographic Foundation Models for Enhancing Predictive Performance Across Diseases and Populations

Chen, Li-Chin, Sheu, Ji-Tian, Chuang, Yuh-Jue

arXiv.org Artificial Intelligence

Demographic attributes are universally present in electronic health records. They are the most widespread information across populations and diseases, and serve as vital predictors in clinical risk stratification and treatment decisions. Despite their significance, these attributes are often treated as auxiliary inputs in model design, with limited attention paid to learning their representations. This study explored the development of a General Demographic Pre-trained (GDP) model as a foundation model tailored to demographic attributes, focusing on age and gender. The model is pre-trained and evaluated on datasets with diverse disease and population compositions from different geographic regions. The GDP architecture was explored by examining combinations of ordering approaches and encoding methods that transform tabular demographic inputs into effective latent embeddings. Results demonstrate that GDP generalizes across tasks, diseases, and populations. In particular, sequential ordering substantially improves discrimination, calibration, and the information gain at each decision-tree split, especially in diseases where age and gender contribute significantly to risk stratification. Even in datasets where demographic attributes hold relatively low predictive value, GDP enhances their representational importance, increasing their influence in downstream gradient-boosting models. The findings suggest that foundation models for tabular demographic attributes offer a promising direction for improving predictive performance in healthcare applications.
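The abstract's core idea is turning tabular (age, gender) inputs into an ordered sequence of latent embeddings. A minimal sketch of that transformation is below; the paper's actual ordering and encoding schemes are not given here, so the decade buckets, embedding dimension, and random tables are all hypothetical placeholders.

```python
import numpy as np

# Illustrative sketch only: bucket sizes, embedding width, and the random
# embedding tables are hypothetical, not the GDP model's parameters.
rng = np.random.default_rng(0)

AGE_BUCKETS = list(range(0, 100, 10))  # decade bins: 0-9, 10-19, ...
GENDERS = ["female", "male"]
EMBED_DIM = 8

# One vector per token; in a trained model these would be learned.
age_table = rng.normal(size=(len(AGE_BUCKETS), EMBED_DIM))
gender_table = rng.normal(size=(len(GENDERS), EMBED_DIM))

def encode(age: int, gender: str) -> np.ndarray:
    """Map tabular (age, gender) to a sequential pair of embeddings."""
    age_idx = min(age // 10, len(AGE_BUCKETS) - 1)
    gender_idx = GENDERS.index(gender)
    # Sequential ordering: age token first, then gender token.
    return np.stack([age_table[age_idx], gender_table[gender_idx]])

emb = encode(67, "female")
print(emb.shape)  # (2, 8): a two-token sequence of demographic embeddings
```

A downstream model (e.g. the gradient-boosting learners mentioned above) would consume these embeddings instead of the raw tabular columns.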


Trial-Level Time-frequency EEG Desynchronization as a Neural Marker of Pain

Blanco-Mora, D. A., Dierolf, A., Gonçalves, J., van Der Meulen, M.

arXiv.org Artificial Intelligence

Pain remains one of the most pressing health challenges, yet its measurement still relies heavily on self-report, limiting monitoring in non-communicative patients and hindering translational research. Neural oscillations recorded with electroencephalography (EEG) provide a promising avenue for identifying reproducible markers of nociceptive processing. Prior studies have reported pain-related event-related desynchronization (ERD) in the alpha and beta bands, but most rely on trial-averaging, obscuring variability that may be critical for perception. We analyzed high-density EEG from 59 healthy participants who underwent electrical stimulation under Pain and No-Pain conditions. Per-trial time-frequency decomposition revealed robust beta-band ERD in frontal-central electrodes that differentiated Pain from No-Pain trials. Generalized linear mixed models demonstrated that ERD scaled with subjective intensity ratings (VAS), and that age and gender moderated this relationship. Reverse models further showed that ERD predicted VAS ratings across participants, underscoring its potential as a nonverbal marker of pain. These findings provide preliminary evidence that trial-level EEG oscillations can serve as reliable indicators of pain and open avenues for individualized, report-free pain monitoring. Future work should validate these results in patient populations and extend analyses to multimodal approaches combining EEG, MRI, and attention-based modulation strategies.


Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection

Hu, Juan, Fan, Shaojing, Sim, Terence

arXiv.org Artificial Intelligence

Multi-face deepfake videos are becoming increasingly prevalent, often appearing in natural social settings that challenge existing detection methods. Most current approaches excel at single-face detection but struggle in multi-face scenarios, due to a lack of awareness of crucial contextual cues. In this work, we develop a novel approach that leverages human cognition to analyze and defend against multi-face deepfake videos. Through a series of human studies, we systematically examine how people detect deepfake faces in social settings. Our quantitative analysis reveals four key cues humans rely on: scene-motion coherence, inter-face appearance compatibility, interpersonal gaze alignment, and face-body consistency. Guided by these insights, we introduce HICOM, a novel framework designed to detect every fake face in multi-face scenarios. Extensive experiments on benchmark datasets show that HICOM improves average accuracy by 3.3% in in-dataset detection and 2.8% under real-world perturbations. Moreover, it outperforms existing methods by 5.8% on unseen datasets, demonstrating the generalization of human-inspired cues. HICOM further enhances interpretability by incorporating an LLM to provide human-readable explanations, making detection results more transparent and convincing. Our work sheds light on involving human factors to enhance defense against deepfakes.


Deep Learning based approach to detect Customer Age, Gender and Expression in Surveillance Video

Ijjina, Earnest Paul, Kanahasabai, Goutham, Joshi, Aniruddha Srinivas

arXiv.org Artificial Intelligence

In the current information era, customer analytics play a key role in the success of any business. Since customer demographics largely dictate customer preferences, identifying the age and gender of customers and using that information in sales forecasting may maximize retail sales. In this work, we propose a computer-vision-based approach to age and gender prediction in surveillance video. The proposed approach leverages the effectiveness of Wide Residual Networks and Xception deep learning models to predict the age and gender demographics of consumers, and is designed to work with raw video captured by a typical CCTV video surveillance system. Its effectiveness is evaluated on real-life garment-store surveillance video captured by a low-resolution camera under non-uniform illumination, with occlusions due to crowding and environmental noise. In addition to demographics, the system can detect customer facial expressions during purchase, which can be used to devise effective marketing strategies for the customer base and maximize sales.


Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification

Sun, Xiangyu, Zou, Xiaoguang, Wu, Yuanquan, Wang, Guotai, Zhang, Shaoting

arXiv.org Artificial Intelligence

X-ray imaging is pivotal in medical diagnostics, offering non-invasive insights into a range of health conditions. Recently, vision-language models, such as the Contrastive Language-Image Pretraining (CLIP) model, have demonstrated potential in improving diagnostic accuracy by leveraging large-scale image-text datasets. However, since CLIP was not initially designed for medical images, several CLIP-like models trained specifically on medical images have been developed. Despite their enhanced performance, issues of fairness, particularly regarding demographic attributes, remain largely unaddressed. In this study, we perform a comprehensive fairness analysis of CLIP-like models applied to X-ray image classification. We assess their performance and fairness across diverse patient demographics and disease categories using zero-shot inference and various fine-tuning techniques, including Linear Probing, Multilayer Perceptron (MLP), Low-Rank Adaptation (LoRA), and full fine-tuning. Our results indicate that while fine-tuning improves model accuracy, fairness concerns persist, highlighting the need for further fairness interventions in these foundation models.
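The zero-shot inference the abstract mentions amounts to comparing an image embedding against one text-prompt embedding per class and picking the most similar. A minimal sketch is below; random vectors stand in for the CLIP image and text encoder outputs, and the class names are hypothetical examples, not the paper's label set.

```python
import numpy as np

# Sketch of CLIP-style zero-shot classification. Random vectors stand in
# for encoder outputs; a real pipeline would embed the X-ray and one text
# prompt per class with the model's image and text encoders.
rng = np.random.default_rng(1)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

labels = ["pneumonia", "cardiomegaly", "no finding"]  # hypothetical classes
text_emb = normalize(rng.normal(size=(len(labels), 512)))  # one prompt each
image_emb = normalize(rng.normal(size=(512,)))             # a single X-ray

# Zero-shot prediction: the class whose prompt embedding is most similar
# (cosine similarity equals the dot product after L2 normalisation).
scores = text_emb @ image_emb
print(labels[int(np.argmax(scores))])
```

Fairness analysis then compares metrics of these predictions across demographic subgroups.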


What's YOUR colour IQ? Take the test to see how your perception of different shades compares to other people your age

Daily Mail - Science & tech

Anyone who's ever stared in desperation at a paint colour chart will know that telling shades apart is not always the easiest task. Due to our biological differences, some people seem to have no trouble separating the subtlest of tones, while others find it tricky to find a matching pair of socks. If you've ever wondered where you fall on this colour spectrum, a new test will reveal how you stack up against your peers. So, what's your colour IQ? Take the test at this link to find out. The test, created by X-rite Pantone, is a simplified version of something called the Farnsworth Munsell 100 Hue Test which was developed in the 1940s by a scientist called Dean Farnsworth.


Using Backbone Foundation Model for Evaluating Fairness in Chest Radiography Without Demographic Data

Queiroz, Dilermando, Anjos, André, Berton, Lilian

arXiv.org Artificial Intelligence

Ensuring consistent performance across diverse populations and incorporating fairness into machine learning models are crucial for advancing medical image diagnostics and promoting equitable healthcare. However, many databases do not provide protected attributes or contain unbalanced representations of demographic groups, complicating the evaluation of model performance across different demographics and the application of bias mitigation techniques that rely on these attributes. This study investigates the effectiveness of using the backbone of Foundation Models as an embedding extractor for creating groups that represent protected attributes, such as gender and age. We propose utilizing these groups at different stages of bias mitigation, including pre-processing, in-processing, and evaluation. Using databases in both in-distribution and out-of-distribution scenarios, we find that the method can create groups that represent gender in both databases, reducing the performance gap between gender groups by 4.44% in-distribution and 6.16% out-of-distribution. However, the model lacks robustness in handling age attributes, underscoring the need for more fundamentally fair and robust Foundation models. These findings suggest a role for this approach in fairness assessment in scenarios where protected attributes are unknown, contributing to the development of more equitable medical diagnostics.
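The key mechanism here is forming proxy demographic groups without labels by clustering the backbone's embeddings. The sketch below uses a tiny hand-rolled k-means on synthetic embeddings; the cluster count, embedding size, and data are illustrative assumptions, and any clustering method would serve.

```python
import numpy as np

# Sketch: cluster backbone embeddings into proxy groups for fairness
# checks when protected attributes are unavailable. The mini k-means and
# the synthetic two-cloud data are illustrative only.
rng = np.random.default_rng(2)

def kmeans(X, k=2, iters=20):
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each embedding to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two synthetic embedding clouds standing in for two demographic groups.
X = np.vstack([rng.normal(0, 1, (50, 16)), rng.normal(4, 1, (50, 16))])
groups = kmeans(X, k=2)
print(groups.shape)  # one proxy-group label per sample
```

Downstream, per-group metrics (e.g. AUC gaps between the proxy groups) substitute for metrics over the missing true attributes.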


A Framework For Gait-Based User Demography Estimation Using Inertial Sensors

Swami, Chinmay Prakash

arXiv.org Artificial Intelligence

Human gait has been shown to provide crucial motion cues for various applications. Recognizing patterns in human gait has been widely adopted in application areas such as security, virtual reality gaming, medical rehabilitation, and ailment identification. Furthermore, wearable inertial sensors have been widely used not only for recording gait but also for predicting users' demography. Machine learning techniques such as deep learning, combined with inertial sensor signals, have shown promising results in recognizing patterns in human gait and estimating users' demography. However, the black-box nature of such deep learning models hinders researchers from uncovering the reasons behind a model's predictions. Therefore, we propose leveraging deep learning and Layer-Wise Relevance Propagation (LRP) to identify the variables that play a vital role in identifying users' demography, such as age and gender. To assess the efficacy of this approach, we train a deep neural network on a large sensor-based gait dataset consisting of 745 subjects to identify users' age and gender. Using LRP, we identify the variables relevant for characterizing gait patterns, enabling interpretation of the non-linear ML models that identify users' demography from inertial signals. We believe this approach can not only provide clinicians with information about the gait parameters relevant to age and gender but also be extended to analyze and diagnose gait disorders.


Character-based Outfit Generation with Vision-augmented Style Extraction via LLMs

Forouzandehmehr, Najmeh, Cao, Yijie, Thakurdesai, Nikhil, Giahi, Ramin, Ma, Luyi, Farrokhsiar, Nima, Xu, Jianpeng, Korpeoglu, Evren, Achan, Kannan

arXiv.org Artificial Intelligence

The outfit generation problem involves recommending a complete outfit to a user based on their interests. Existing approaches focus on recommending items based on anchor items or specific query styles but do not consider customer interest in famous characters from movies, social media, and other sources. In this paper, we define a new Character-based Outfit Generation (COG) problem, designed to accurately interpret character information and generate complete outfit sets according to customer specifications such as age and gender. To tackle this problem, we propose a novel framework, LVA-COG, that leverages Large Language Models (LLMs) to extract insights from customer interests (e.g., character information) and employs prompt-engineering techniques for accurate understanding of customer preferences. Additionally, we incorporate text-to-image models to enhance the visual understanding and generation (factual or counterfactual) of cohesive outfits. Our framework integrates LLMs with text-to-image models and improves the customer's approach to fashion by generating personalized recommendations. With experiments and case studies, we demonstrate the effectiveness of our solution along multiple dimensions.


HyMNet: a Multimodal Deep Learning System for Hypertension Classification using Fundus Photographs and Cardiometabolic Risk Factors

Baharoon, Mohammed, Almatar, Hessa, Alduhayan, Reema, Aldebasi, Tariq, Alahmadi, Badr, Bokhari, Yahya, Alawad, Mohammed, Almazroa, Ahmed, Aljouie, Abdulrhman

arXiv.org Artificial Intelligence

In recent years, deep learning has shown promise in predicting hypertension (HTN) from fundus images. However, most prior research has primarily focused on analyzing a single type of data, which may not capture the full complexity of HTN risk. To address this limitation, this study introduces a multimodal deep learning (MMDL) system, dubbed HyMNet, which combines fundus images and cardiometabolic risk factors, specifically age and gender, to improve hypertension detection capabilities. Our MMDL system uses the DenseNet-201 architecture, pre-trained on ImageNet, for the fundus imaging path and a fully connected neural network for the age and gender path. The two paths are jointly trained by concatenating the 64 features output from each path, which are then fed into a fusion network. The system was trained on 1,143 retinal images from 626 individuals collected from the Saudi Ministry of National Guard Health Affairs. The results show that the multimodal model that integrates fundus images along with age and gender achieved an AUC of 0.791 [CI: 0.735, 0.848], outperforming the unimodal model trained solely on fundus photographs, which yielded an AUC of 0.766 [CI: 0.705, 0.828] for hypertension detection. Abbreviations: BP, blood pressure; CVD, cardiovascular disease; EHR, electronic health record; EMR, electronic medical records; AI, artificial intelligence; DL, deep learning; MMDL, multimodal deep learning; SVM, support vector machine; FCNN, fully connected neural network; CNN, convolutional neural network; ReLU, rectified linear unit; AUC, area under the receiver operating characteristic curve; PR, area under the precision-recall curve; CI, confidence interval; MAE, mean absolute error; KAIMRC, King Abdullah International Medical Research Center. Keywords: Artificial Intelligence; Machine Learning; Computer Vision.

1. Introduction

Cardiovascular diseases persist as one of the primary causes of mortality worldwide, with hypertension, or high blood pressure (BP), serving as a significant contributing risk factor (1,2).
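The fusion step described in the HyMNet abstract, concatenating a 64-d fundus feature with a 64-d age/gender feature before a fusion head, can be sketched as follows. The random vectors and single-sigmoid head are placeholders for the trained DenseNet-201 path, FCNN path, and fusion network, not HyMNet's actual parameters.

```python
import numpy as np

# Sketch of the two-path late fusion: a 64-d vector from the fundus-image
# path and a 64-d vector from the age/gender path are concatenated and
# passed through a small fusion head. All weights here are random
# placeholders, not HyMNet's trained parameters.
rng = np.random.default_rng(4)

fundus_feat = rng.normal(size=64)  # stand-in for the DenseNet-201 output
demo_feat = rng.normal(size=64)    # stand-in for the age/gender FCNN output

fused = np.concatenate([fundus_feat, demo_feat])  # 128-d joint feature

# Fusion head: one sigmoid unit giving a hypertension probability.
W = rng.normal(size=(128, 1))
p = 1.0 / (1.0 + np.exp(-(fused @ W)))
print(p.shape)  # (1,)
```

Joint training would backpropagate the classification loss through this head into both feature paths simultaneously.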